Abstract:
In this paper, we investigate solutions relying on data partitioning schemes for parallel building of OLAP data cubes, suitable to novel Big Data environments, and we propose the framework OLAP*, along with the associated benchmark TPC-H*d, a suitable transformation of the well-known data warehouse benchmark TPC-H. We demonstrate through performance measurements the efficiency of the proposed framework, developed on top of the ROLAP server Mondrian.
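The partition-then-merge pattern that underlies parallel cube construction can be sketched as follows. This is a minimal, hypothetical illustration, not the OLAP* implementation itself: fact rows are hash-partitioned on a grouped dimension, each partition is aggregated independently (in parallel, in a real system), and the disjoint partial cubes are merged.

```python
from collections import defaultdict

def partition(rows, n_parts, key):
    """Hash-partition fact rows on a dimension key so each
    worker can aggregate its shard independently."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def local_aggregate(rows, dims, measure):
    """Roll up one shard: group by the dimension tuple, sum the measure."""
    agg = defaultdict(float)
    for row in rows:
        agg[tuple(row[d] for d in dims)] += row[measure]
    return agg

def merge(partials):
    """Union the partial cubes; hash partitioning on a grouped
    dimension makes the groups disjoint, so merging is a plain sum."""
    cube = defaultdict(float)
    for partial in partials:
        for group, total in partial.items():
            cube[group] += total
    return dict(cube)

# Illustrative fact rows (all names are hypothetical).
rows = [
    {"region": "EU", "year": 2023, "sales": 10.0},
    {"region": "EU", "year": 2024, "sales": 5.0},
    {"region": "US", "year": 2023, "sales": 7.0},
]
partials = [local_aggregate(p, ("region", "year"), "sales")
            for p in partition(rows, 2, "region")]
cube = merge(partials)
```

Because each group lives in exactly one partition, the merge step never has to resolve conflicts; this is what makes the scheme embarrassingly parallel.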
Abstract:
This paper reports two projects for supporting decisions of the electricity company of the Azores Islands, Electricidade dos Acores. There were several decisions to support, such as whether communications between islands should move from the existing telephone lines to VoIP, and whether better models to forecast power consumption should be adopted. The solution established integrates OLAP cubes in a data mining project based on the CRISP-DM process model. For both strategic and more operational decisions, the objective was always to obtain accurate data, build a data warehouse, and provide tools to analyze it in order to properly inform the decision makers. These DSSs translate large CSV flat files or acquire data in real time from operational databases to update a data warehouse, including importing, evaluating data quality, and populating relational tables. Multidimensional data cubes with numerous dimensions and measures were used for operational decisions and as exploration tools for the strategic ones. Data mining models for forecasting, clustering, decision trees, and association rules identified several inefficient procedures and even fraud situations. Not only was it possible to support the necessary decisions, but several models were also produced so that decision makers and strategists could address new problems.
Abstract:
The prediction of future energy consumption of buildings based on historical performances is an important approach to achieve energy efficiency. A simulation method is here introduced to obtain sufficient clean historical consumption data to improve the accuracy of the prediction. The widely used statistical learning method, Support Vector Machines (SVMs), is then applied to train and to evaluate the prediction model. Due to the time-consuming problem of the training process, a parallel approach is applied to improve the speed of the training of large amounts of data when considering multiple buildings. The experimental results show very good performance of this model and of the parallel approach, allowing the application of Support Vector Machines on more complex problems of energy efficiency involving large datasets.
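The data-parallel scheme described here, one independent model per building, can be sketched as below. A simple least-squares line stands in for the SVM regressor, and every name is illustrative rather than taken from the paper; the point is that per-building training jobs share nothing and so parallelize trivially.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_regressor(samples):
    """Placeholder regressor (least-squares line) standing in for an
    SVM: fits consumption = a * t + b over the historical samples."""
    n = len(samples)
    xs = list(range(n))
    mx = sum(xs) / n
    my = sum(samples) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, samples))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def train_all(histories, workers=4):
    """Train one model per building concurrently; the per-building
    problems are independent, so they map cleanly onto a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(histories,
                        pool.map(fit_regressor, histories.values())))

models = train_all({
    "building_A": [1.0, 2.0, 3.0, 4.0],  # linear ramp
    "building_B": [5.0, 5.0, 5.0, 5.0],  # flat load
})
```

A real deployment would replace `fit_regressor` with an SVM training call and a process pool, but the coordination pattern is the same.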
Abstract:
While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PE should be. Architecting PEs as vector processors holds the promise to greatly reduce their instruction fetch bandwidth, mitigating the Von Neumann Bottleneck (VNB). However, due to their historical association with supercomputers, classical vector machines include microarchitectural tricks to improve the Instruction Level Parallelism (ILP), which increases their instruction fetch and decode energy overhead. In this paper, we explore for the first time vector processing as an option to build small and efficient PEs for large-scale shared-L1 clusters. We propose Spatz, a compact, modular 32-bit vector processing unit based on the integer embedded subset of the RISC-V Vector Extension version 1.0. A Spatz-based cluster with four Multiply-Accumulate Units (MACUs) needs only 7.9 pJ per 32-bit integer multiply-accumulate operation, 40% less energy than an equivalent cluster built with four Snitch scalar cores. We analyzed Spatz's performance by integrating it within MemPool, a large-scale many-core shared-L1 cluster. The Spatz-based MemPool system achieves up to 285 GOPS when running a 256 × 256 32-bit integer matrix multiplication, 70% more than the equivalent Snitch-based MemPool system. In terms of energy efficiency, the Spatz-based MemPool system achieves up to 266 GOPS/W when running the same kernel, more than twice the energy efficiency of the Snitch-based MemPool system, which reaches 128 GOPS/W. These results show the viability of lean vector processors as high-performance and energy-efficient PEs for large-scale clusters with tightly-coupled L1 memory.
Abstract:
On-line analytical processing (OLAP) applications require high-performance database support to achieve good response time, which is crucial for decision making. Database clusters provide a cost-effective alternative to parallel database systems. For OLAP applications, which typically use heavyweight queries, intra-query parallelism yields better performance because it reduces the execution time of individual queries. Intra-query parallelism is based on processing the same query on different subsets of the query table. Combining physical and virtual partitioning to define table subsets provides flexibility in intra-query parallelism while optimizing disk space usage and data availability. Experiments with our partitioning technique using TPC-H benchmark queries on a 32-node dual-processor cluster gave linear and super-linear speedup, significantly reducing the time of typical OLAP heavyweight queries.
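Virtual partitioning as described, rewriting one query into sub-queries that differ only in an added range predicate on a partitioning key, can be sketched like this. This is an in-memory Python stand-in; the real technique issues SQL sub-queries against cluster nodes, and every identifier here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def virtual_partitions(lo, hi, n):
    """Split the key domain [lo, hi) into n half-open ranges; each range
    becomes an extra WHERE predicate appended to the same base query."""
    step = (hi - lo) // n
    bounds = [lo + i * step for i in range(n)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))

def run_subquery(rows, lo, hi):
    """Stand-in for one sub-query: the same aggregation, restricted to
    orderkey in [lo, hi)."""
    return sum(r["price"] for r in rows if lo <= r["orderkey"] < hi)

def parallel_total(rows, lo, hi, n):
    """Run the n sub-queries concurrently and combine the partial
    aggregates (a sum distributes over disjoint ranges)."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = pool.map(lambda b: run_subquery(rows, *b),
                         virtual_partitions(lo, hi, n))
    return sum(parts)

rows = [{"orderkey": k, "price": float(k)} for k in range(100)]
total = parallel_total(rows, 0, 100, 4)
```

Because the ranges are disjoint and cover the domain, the combined result equals the unpartitioned query's result; only aggregates that distribute (SUM, COUNT, MIN, MAX) combine this simply.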
Abstract:
Kernel methods play a pivotal role in machine learning algorithms. Unfortunately, working with kernel methods requires dealing with an n × n kernel matrix, which is memory intensive. In this paper, we present a parallel, approximate matrix factorization algorithm, which loads only essential data to individual processors to enable parallel processing of data. Our method reduces the space requirement for the kernel matrix from O(n²) to O(np/m), where n is the amount of data, p the reduced matrix dimension (p ≪ n), and m the number of processors.
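A Nyström-style column-sampling factorization is one standard way to obtain such an O(np) representation; the sketch below (using NumPy, and not the paper's specific algorithm) stores only an n-by-p block plus a p-by-p landmark inverse instead of the full n-by-n kernel matrix.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """RBF kernel block: k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, p, gamma=1.0, seed=0):
    """Nystrom-style factorization: sample p landmark points and keep
    only the n-by-p block C and the pseudo-inverse of the p-by-p
    landmark kernel W, so memory is O(np) instead of O(n^2); the full
    matrix is approximated by C @ pinv(W) @ C.T."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=p, replace=False)
    C = rbf(X, X[idx], gamma)        # n x p: the only large block kept
    W = rbf(X[idx], X[idx], gamma)   # p x p landmark kernel
    return C, np.linalg.pinv(W)

# Well-separated points keep the kernel well conditioned; with p = n the
# factorization reconstructs K exactly, while in practice p << n trades
# accuracy for the O(np) memory bound.
X = np.array([[3.0 * i, 0.0, 0.0] for i in range(8)])
C, W_inv = nystrom(X, p=8)
K_approx = C @ W_inv @ C.T
```

Distributing the rows of C across m processors then gives the O(np/m) per-processor footprint the abstract describes.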
Abstract:
We present the design of a global object space in a distributed Java Virtual Machine that supports parallel execution of a multi-threaded Java program on a cluster of computers. The global object space virtualizes a single Java object heap across machine boundaries to facilitate transparent object accesses. Based on object connectivity information that is available at runtime, objects reachable from threads at different nodes, called distributed-shared objects, are detected. With the detection of distributed-shared objects, we can reduce the overhead of maintaining memory consistency within the global object space. Several runtime optimization methods have been incorporated into the global object space design, including object home migration, which reallocates the home of a distributed-shared object; synchronized method migration, which allows the remote execution of a synchronized method at the home node of its synchronized object; and object pushing, which uses the object connectivity information to improve access locality.
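The object-home-migration idea, relocating a distributed-shared object's home to the node that accesses it most, can be sketched with a toy bookkeeping model. Python is used here for brevity; the actual system works inside a JVM, and every name below is illustrative.

```python
from collections import Counter

class GlobalObjectSpace:
    """Toy model of home-based placement: each shared object has a home
    node, remote accesses are counted, and the home migrates to the node
    that accesses the object most (a sketch of object home migration)."""

    def __init__(self):
        self.home = {}
        self.accesses = {}

    def register(self, obj_id, node):
        """Place a newly shared object at an initial home node."""
        self.home[obj_id] = node
        self.accesses[obj_id] = Counter()

    def access(self, obj_id, node):
        """Record an access; returns True when it was home-local."""
        self.accesses[obj_id][node] += 1
        return self.home[obj_id] == node

    def migrate_homes(self):
        """Reassign each object's home to its most frequent accessor,
        so future accesses from that node become local."""
        for obj_id, counts in self.accesses.items():
            if counts:
                self.home[obj_id] = counts.most_common(1)[0][0]

gos = GlobalObjectSpace()
gos.register("shared_list", node=0)
for _ in range(5):
    gos.access("shared_list", node=1)  # node 1 keeps hitting remotely
gos.access("shared_list", node=0)
gos.migrate_homes()                     # home moves to node 1
```

A production design would also weigh migration cost and consistency traffic, which this sketch omits.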